Takakuni DOUSEKI Masashi YONEMARU Eiji IKUTA Akira MATSUZAWA Atsushi KAMEYAMA Shunsuke BABA Tohru MOGAMI Hakaru KYURAGI
This paper describes an ultralow-power multi-threshold (MT) CMOS/SOI circuit technique that mainly uses fully-depleted MOSFETs. The MTCMOS/SOI circuit, which combines fully-depleted low- and medium-Vth CMOS/SOI logic gates and high-Vth power-switch transistors, makes it possible to lower the supply voltage to 0.5 V and reduce the power dissipation of LSIs to the 1-mW level. We overview some MTCMOS/SOI digital and analog components, such as a CPU, memory, analog/RF circuit and DC-DC converter for an ultralow-power mobile system. The validity of the ultralow-voltage MTCMOS/SOI circuits is confirmed by the demonstration of a self-powered 300-MHz-band short-range wireless system. A 1-V SAW oscillator and a switched-capacitor-type DC-DC converter in the transmitter makes possible self-powered transmission by the heat from a hand. In the receiver, a 0.5-V digital controller composed of a 8-bit CPU, 256-kbit SRAM, and ROM also make self-powered operation under illumination possible.
This paper presents a Quiet-Noisy scan technique for low power delay fault testing. The novel scan cell design provides both the quiet and noisy scan modes. The toggling of scan cell outputs is suppressed in the quiet scan mode so the power is saved. Two-pattern tests are applied in the noisy scan mode so the delay fault testing is possible. The experimental data shows that the Quiet-Noisy scan technique effectively reduces the test power to 56% of that of the regular scan. The transition fault coverage is improved by 19.7% compared to an existing toggle suppression low power technique. The presented technique requires very minimal changes in the existing MUX-scan Design For Testability (DFT) methodology and needs virtually no computation. The penalties are area overhead, speed degradation, and one extra control in test mode.
In low-power embedded processors design, stand-by power constraint is also important with average power and operation frequency. Multi-threshold-voltage cells are used in the design and the ratio of low-Vth cells should be controlled. On the other hand, physical synthesis flow is indispensable to achieve high performance and short design time. This paper proposes a physical synthesis methodology under the restriction of maximum low-Vth cell ratio. The experimental results show that our method can achieve only 4 MHz slower logic within 5% margin of the target low-Vth ratio. We have applied this design flow in an application processor design and the designed processor demonstrates 360 MIPS at 200 MHz only with 80 mW at 1.0 V, namely 4500 MIPS/W and 4.2 mA leakage current without any power-cut mode.
Kugan VIVEKANANDARAJAH Thambipillai SRIKANTHAN Christopher T. CLARKE Saurav BHATTACHARYYA
Energy dissipation in cache memories is becoming a major design issue for embedded microprocessors. Predictive filter cache based instruction cache hierarchy has been shown to effectively reduce the energy-delay product. In this paper, a simplified pattern prediction algorithm is proposed for the filter cache hierarchy. The prediction scheme relies on the static nature of the hit or miss pattern of the instruction access streams. The static patterns are maintained in a small 32x1-bit wide Static Pattern Table (SPT). Our investigations show that the proposed prediction algorithm is superior to that based on Next Fetch Prediction Table (NFPT) for all the benchmarks simulated. With the proposed approach, energy delay product reduction of up to 6.79% was evident when compared with that using NFPT. Moreover, since the prediction scheme is based on the static assignment of patterns, it lends well for area and power efficient implementation than that employs dynamic pattern prediction although it is marginally inferior (i.e. 0.69%) in term of energy delay product.
Fumitoshi HATORI Hiroki ISHIKURO Mototsugu HAMADA Ken-ichi AGAWA Shouhei KOUSAI Hiroyuki KOBAYASHI Duc Minh NGUYEN
This paper describes a full-CMOS single-chip Bluetooth LSI fabricated using a 0.18 µm CMOS, triple-well, quad-metal technology. The chip integrates radio and baseband, which is compliant with Bluetooth Core Specification version 1.1. A direct modulation transmitter and a low-IF receiver architecture are employed for the low-power and low-cost implementation. To reduce the power consumption of the digital blocks, it uses a clock gating technique during the active modes and a power manager during the low power modes. The maximum power consumption is 75 mW for the transmission, 120 mW for the reception and 30 µW for the low power mode operation. These values are low enough for mobile applications. Sensitivity of -80 dBm has been achieved and the transmitter can deliver up to 4 dBm.
Hisakazu SATO Yasuhiro NUNOMURA Niichi ITOH Koji NII Kanako YOSHIDA Hironobu ITO Jingo NAKANISHI Hidehiro TAKATA Yasunobu NAKASE Hiroshi MAKINO Akira YAMADA Takahiko ARAKAWA Toru SHIMIZU Yuichi HIRANO Takashi IPPOSHI Shuhei IWADE
A low-power microcontroller has been developed with 0.10 µm bulk compatible body-tied SOI technology. For this work, only two new masks are required. For the other layers, existing masks of a prior work developed with 0.18 µm bulk CMOS technology can be applied without any changes. With the SOI technology, the high-speed operation of over 600 MHz has been achieved at a supply voltage of 1.2 V, which is 1.5 times faster than prior work. Also, a five times improvement in the power-delay product has been achieved at a supply voltage 0.8 V. Moreover, the compatibility of the SOI technology with bulk CMOS has been verified, because all circuit blocks of the chip, including logic, memory, analog circuit, and PLL, are completely functional, even though only two new masks are used.
Masanori MUROYAMA Akihiko HYODO Takanori OKUMA Hiroto YASUURA
To transfer a small number, we inherently need a small number of bits. However all bit lines on a data bus change their status and redundant power is consumed. To reduce the redundant power consumption, we introduce a concept named active bit. In this paper, we propose a power reduction scheme for data buses using active bits. Suppressing switching activity of inactive bits, we can reduce redundant power consumption. We propose various power reduction techniques using active bits and the implementation methods. Experimental results illustrate up to 54.2% switching activity reduction.
Masayuki MIYAMA Junichi MIYAKOSHI Kousuke IMAMURA Hideo HASHIMOTO Masahiko YOSHIMOTO
This paper describes a VLSI-oriented motion estimation algorithm using a steepest descent method (SDM) applied to MPEG-4 visual communication with a mobile terminal. The SDM algorithm is optimized for QCIF or CIF resolution video and VLSI implementation. The SDM combined with a subblock search method is developed to enhance picture quality. Simulation results show that a mean PSNR drop of the SDM algorithm processing QCIF 15 fps resolution video in comparison with a full search algorithm is -0.17 dB. Power consumption of a VLSI based on the SDM algorithm assuming 0.18 µm CMOS technology is estimated at 2 mW. The VLSI attains higher picture quality than that based on the other fast motion estimation algorithm, and is applicable to mobile video applications.
Yoshinobu HIGAMI Shin-ya KOBAYASHI Yuzo TAKAMATSU
When LSIs that are designed and manufactured for low power dissipation are tested, test vectors that make the power dissipation low should be applied. If test vectors that cause high power dissipation are applied, incorrect test results are obtained or circuits under test are permanently damaged. In this paper, we propose a method to generate test sequences with low power dissipation for sequential circuits. We assume test sequences generated by an ATPG tool are given, and modify them while keeping the original stuck-at fault coverages. The test sequence is modified by inverting the values of primary inputs of every test vector one by one. In order to keep the original fault coverage, fault simulation is conducted whenever one value of primary inputs is inverted. We introduce heuristics that perform fault simulation for a subset of faults during the modification of test vectors. This helps reduce the power dissipation of the modified test sequence. If the fault coverage by the modified test sequence is lower than that by the original test sequence, we generate a new short test sequence and add it to the modified test sequence.
Takaki YOSHIDA Masafumi WATARI
As semiconductor manufacturing technology advances, power dissipation and noise in scan testing have become critical problems. Our studies on practical LSI manufacturing show that power supply voltage drop causes testing problems during shift operations in scan testing. In this paper, we present a new testing method named MD-SCAN (Multi-Duty SCAN) which solves power supply voltage drop problems, as well as its experimental results applied to practical LSI chips.
As power consumption becoming a critical concern for System-On-a-Chip (SOC) design, accurate and efficient power analysis and estimation during the design phase at all levels of abstraction are becoming increasingly pressing in order to achieve low power without a costly redesign process. This paper surveys analysis and estimation techniques of dynamic power and leakage power for SOC design covering multiple design levels, which have been recently proposed, aiming to present a cohesive view of the power estimation techniques at all design levels of abstraction.
Takeshi FUJINO Akira YAMAZAKI Yasuhiko TAITO Mitsuya KINOSHITA Fukashi MORISHITA Teruhiko AMANO Masaru HARAGUCHI Makoto HATAKENAKA Atsushi AMO Atsushi HACHISUKA Kazutami ARIMOTO Hideyuki OZAKI
A low power 16 Mb embedded DRAM (eDRAM) macro is fabricated using 0.15 µm logic -based embedded DRAM process technology. A 0.5 µm2 CUB (
Satoshi KOMATSU Masahiro FUJITA
The power dissipation at the off-chip bus has become a significant part of the overall power dissipation in micro-processor based digital systems. This paper presents irredundant address bus encoding methods which reduce signal transitions on the instruction address buses by using adaptive codebook methods. These methods are based on the temporal locality and spatial locality of instruction address. Since applications tend to JUMP/BRANCH to limited sets of addresses, proposed encoding methods assign the least signal transition codes to the addresses of JUMP/BRANCH operations in the past. In addition, our methods can be easily applicable for conventional digital systems since they are irredundant encoding methods. Our encoding methods reduce the signal transitions on the instruction address buses, which results in the reduction of total power dissipation of digital systems. Experimental results show that our methods can reduce the signal transition by an average of 88%.
Christos DROSOS Chrissavgi DRE Spyridon BLIONAS Dimitrios SOUDRIS
The architecture and implementation of a novel processor suitable wireless terminal applications, is introduced. The wireless terminal is based on the novel dual-mode baseband processor for DECT and GSM, which supports both heterodyne and direct conversion terminal architectures and is capable to undertake all baseband signal processing, and an innovative direct conversion low power modulator/demodulator for DECT and GSM. The state of the art design methodologies for embedded applications and innovative low-power design steps followed for a single chip solution. The performance of the implemented dual mode direct conversion wireless terminal was tested and measured for compliance to the standards. The developed innovative terminal fulfils all the requirements and specifications imposed by the DECT and GSM standards.
Takashi HASHIMOTO Shunichi KUROMARU Masayoshi TOUJIMA Yasuo KOHASHI Masatoshi MATSUO Toshihiro MORIIWA Masahiro OHASHI Tsuyoshi NAKAMURA Mana HAMADA Yuji SUGISAWA Miki KUROMARU Tomonori YONEZAWA Satoshi KAJITA Takahiro KONDO Hiroki OTSUKI Kohkichi HASHIMOTO Hiromasa NAKAJIMA Taro FUKUNAGA Hiroaki TOIDA Yasuo IIZUKA Hitoshi FUJIMOTO Junji MICHIYAMA
A low power MPEG-4 video codec LSI with the capability for core profile decoding is presented. A 16-b DSP with a vector pipeline architecture and a 32-b arithmetic unit, eight dedicated hardware engines to accelerate MPEG-4 SP@L1 codec, CP@L1 decoding and post video processing, 20-Mb embedded DRAM, and three peripheral blocks are integrated together on a single chip. MPEG-4 SP@L1 codec, CP@L1 decoding and post video processing are realized with a hybrid architecture consisting of a programmable DSP and dedicated hardware engines at low operating frequency. In order to reduce the power consumption, clock gating technique is fully adopted in each hardware block and embedded DRAM is employed. The chip is implemented using 0.18-µm quad-metal CMOS technology, and its die area is 8.8 mm 8.6 mm. The power consumption is 90 mW at a SP@L1 codec and 110 mW at a CP@L1 decoding.
Seongmo PARK Miyoung LEE KyoungSeon SHIN Hanjin CHO Jongdae KIM Dukdong LEE
In this paper, we present a design of MPEG-4 video codec chip to reduce the power consumption using frame level clock gating, macro block level and motion estimation skip scheme. It performs 30 frames/s of codec (encoding and decoding) mode with quarter-common intermediate format (QCIF) at 27 MHz. Power consumption is 290 mW at 27 MHz operation, which is achieving 35% power saving compared to a conventional CMOS. Motion Estimation skip method is employed to reduce 32% computation load. This chip performs MPEG-4 Simple Profile Level 2 (Simple@L2) and H. 263 base mode. Its contains 388,885 gates, 662 k bits memory, and the chip size was 9.7 mm9.7 mm which was fabricated using 0.35 micron 3-layers metal CMOS technology.
Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
In this paper, we propose an instruction encoding scheme to reduce power consumption of instruction ROMs. The power consumption of the instruction ROM strongly depends on the switching activity of bit-lines due to their large load capacitance. In our approach, the binary-patterns to be assigned as op-codes are determined based on the frequency of instructions in order to reduce the number of bit-line dis-charging. Simulation results show that our approach can reduce 40% of bit-line switchings from a conventional organization.
Hiromi NOTANI Masayuki KOYAMA Ryuji MANO Hiroshi MAKINO Yoshio MATSUDA Osamu TOMISAWA Shuhei IWADE
A 64-bit 100-MHz multimedia DSP core has been designed using 0.15-µ m CMOS technology. An improved Auto-Backgate-Controlled MT-CMOS (ABC-MT-CMOS) circuit with a charge pump is adopted to suppress the standby leakage current. The dynamic active current of whole chip was simulated to optimize the size of a switch for a power supply control. The DSP core chip, which integrates 300-kgate Logic, 64-kbyte SRAM and charge pump circuit, has 8-µ A standby leakage current. The reduction rate is 1/250.
In this paper, two low power hardware structures essential for MPEG-4 video codec are proposed for portable applications. First, an adaptive bit resolution control (ABRC) scheme is proposed for a processing element (PE) in a systolic-array type motion estimator (ME). By appropriately modifying the datapath of PE to exploit the correlations in pixel values, its structure is optimized in terms of both hardware cost and low power consumption. As a result, power is saved up to 29% compared with a conventional PE while the computation accuracy is preserved and the overhead is kept negligible. Second, a low power motion compensation (MC) accelerator is proposed. By embedding DRAM whose structure is optimized for low power consumption, the power consumption for external data I/Os is dramatically reduced. In addition, distributed nine-tiled block mapping (DNTBM) with partial activation scheme in the frame buffer reduces the power for accessing frame buffer up to 31% compared to a conventional 1-bank tiled mapping. With the proposed MC accelerator, MPEG-4 SP@L1 decoding system is fabricated using 0.18 µm embedded memory logic (EML) technology.
Hiroshi TAKAHASHI Rimon IKENO Yutaka TOYONOH Akihiro TAKEGAMA Yasumasa IKEZAKI Tohru URASAKI Hitoshi SATOH Masayasu ITOIGAWA Yoshinari MATSUMOTO
High-speed and low-power DSPs have been developed for versatile hand set applications. The DSP contains a 16-bit fixed point DSP core with multiple buses, highly tuned instruction sets and a low-power architecture, featuring CPU power with 404.5 µ W/MHz, chip power with 2.08 mW/MHz at peak and 200 µA stand-by current and 160 MHz/160 MIPS performance by a single DSP core, and also operates at 0.68 V within the temperature range from -40C to 125C in the worst case (Weak corner) even using much higher I-off current process compared to a conventional process to obtain a faster operating frequency. In this paper, we discuss circuit design techniques to continue scaling down valuable IP cores keeping the same functionality, better speed performance, and lower power dissipation with much lower voltage operation capability. For further power reduction by DSP software, Run-time Power Control (RPC) has been demonstrated in an MP3 player using 100 MHz/100 MIPS DSP at 1.8 V, which is a real-time application running on an Internet audio evaluation module experimentally and we obtained 32-60% power reduction on various music source data.